ABSTRACT

In this work we introduce a new way of binning sunspot group data with the purpose of better 
understanding the impact of the solar cycle on sunspot properties and how this defined the
characteristics of the extended minimum of cycle 23. Our approach assumes that the statistical properties
of sunspots are completely determined by the strength of the underlying large-scale field and have no 
additional time dependencies. We use the amplitude of the cycle at any given moment (something we 
refer to as activity level) as a proxy for the strength of this deep-seated magnetic field. 

We find that the sunspot size distribution is composed of two populations: one population of groups 
and active regions and a second population of pores and ephemeral regions. When fits are performed 
at periods of different activity level, only the statistical properties of the former population, the active
regions, are found to vary.

Finally, we study the relative contribution of each component (small-scale versus large-scale) to 
solar magnetism. We find that when hemispheres are treated separately, almost every one of the past 
12 solar minima reaches a point where the main contribution to magnetism comes from the small-scale 
component. However, due to asymmetries in cycle phase, this state is very rarely reached by both 
hemispheres at the same time. From this we infer that even though each hemisphere did reach the 
magnetic baseline, from a heliospheric point of view the minimum of cycle 23 was not as deep as it 
could have been. 

Subject headings: Sun: sunspots — Sun: magnetic fields — Sun: photosphere — Sun: activity 


1. INTRODUCTION 

The solar magnetic cycle is a process that takes the Sun through successive periods of
high (maximum) and low (minimum) activity. Since the pioneering work of Parker (1955),
there has been a continuous effort to understand the mechanisms that keep it going and
define its properties (see the review by Charbonneau 2010). Most of this effort has focused
on understanding the periods of highest activity (solar maximum). However, this focus
shifted after the arrival of the unexpectedly deep minimum of solar cycle 23, in which
record lows were measured across the board in solar activity indices and solar wind
properties.

One of the most direct proxies of solar activity, and the one most commonly used to
understand cycle variability, is the presence of sunspots on the photosphere. One of its
main advantages is that reliable sunspot records exist that span more than a century.
Sunspot groups are associated with bipolar magnetic regions (BMRs), which are believed to
originate from an underlying large-scale toroidal field that is critical for the evolution
of the solar cycle (see the review by Fan 2009). Furthermore, thanks to the systematic
orientation and tilt of BMRs, their emergence and decay seem to be the primary mechanism
that regenerates the poloidal field from which [...]

[...] connection between sunspots and the solar cycle make it an ideal proxy with which to
probe this question. However, distilling information out of sunspot (and BMR) properties
is non-trivial due to the large variability observed in their properties (area, flux, tilt,
and time, latitude, and longitude of emergence). The main objective of this paper is to
introduce a new technique for studying the cycle dependence of sunspot properties
(specifically sunspot group area) and demonstrate how powerful it is for contextualizing
the extended minimum of cycle 23.

In order to better explain this new approach, we begin with a brief overview of the
traditional approaches for binning sunspot and BMR properties to study the solar cycle (by
date, by cycle, and by cycle phase; see Section 2). Then we discuss why these approaches
are sub-optimal when the object of interest is the sunspots themselves, and we define
activity level and how to use it to bin sunspot data (see Section 3). In Section 4 we
introduce our two databases and how to combine them by taking advantage of their
statistical properties, followed by our statistical model and our methods for fitting the
data and assessing their relative performance (see Section 5). In Section 6 we separate our
data according to activity level and fit each activity level bin separately. We also show
how the area distribution related to small structures is independent of activity level. In
Section 7 we show how the area distribution related to BMRs is strongly dependent on
activity level and redefine our statistical model to take advantage of this relationship.
In Section 8 we demonstrate quantitatively that an activity-level-dependent statistical
model is the best of all the models proposed in this paper for characterizing our data. In
Section 9 we take advantage of this model to better understand the extended minimum of
cycle 23 and put it in the context of the last 12 cycles. In Section 10 we discuss whether
the existence of a composite distribution arises from different components of the dynamo
(small-scale versus global), or whether an alternative interpretation of the results is
necessary. We finish with a summary and conclusions in Section 11.

2. TIME: THE PREDOMINANT WAY OF BINNING DATA 

Given that the solar cycle is a continuously evolving transient process, time has been the
traditional primary focus of statistical analyses of sunspot and BMR properties. This
means that comparative studies break down and compartmentalize data into chunks defined by
when they occur. This kind of binning is illustrated in Figures 1(a) and (b), which show
vertical markers that break down our data into separate cycles. There are essentially
three ways of binning data in time; to illustrate them, in this section we present a very
limited review of studies using each type (which should not be considered comprehensive).
Example studies are chosen specifically to highlight the advantages of each approach.

2.1. Binning Data by Arbitrary Time Interval 

The most straightforward way of time binning is to separate data into specific time
intervals (by day, month, year, or Carrington rotation). This kind of binning is the most
natural form of data exploration (see, for example, the pioneering studies by Tang et al.
1984; Wang & Sheeley 1989), and it is still widely used today. It is particularly powerful
for searching for evidence of long-term trends (spanning timescales longer than the solar
cycle itself). More recent examples of this kind of binning are the thought-provoking
papers by Penn & Livingston (2006, 2011), who reported a decrease in the average magnetic
field of sunspots since 1998; they speculated that, if this trend continued, sunspots
would become very rare by 2025. Very good examples of how best to use this kind of binning
are the responses to Penn & Livingston by Nagovitsyn et al. (2012) and Pevtsov et al.
(2014), who, using data going back to 1920, demonstrate that there is no apparent
long-term trend in the evolution of the average properties of sunspots; there is simply a
cyclic modulation.


2.2. Binning Data by Cycle 

The second way of binning data involves grouping data according to the cycle to which they
belong. This kind of binning is very good for identifying significant changes between
different cycles. Some examples of this kind of binning are the papers by Hathaway &
Choudhary (2008), who looked at the decay rate of sunspots; McClintock & Norton (2013),
who studied variations in sunspot group tilt angles and Joy's Law; and de Toma et al.
(2013), who examined changes in sunspot area between cycles 22 and 23.

This type of time binning truly excels when used for studying the physical mechanisms
responsible for sustaining and propagating the solar cycle. An excellent example of this
kind of work was performed by Dasi-Espuig et al. (2010), who, by looking at cyclic averages
of tilt angles, found a correlation between weighted averages of sunspot group properties
during a cycle and the amplitude of the following cycle (not to be confused with the
reported same-cycle negative correlation between the amplitude of a cycle and tilt
averages; see Ivanov 2012; McClintock & Norton 2013; Dasi-Espuig et al. 2013). This result
has been further expanded by Muñoz-Jaramillo et al. (2013), who demonstrated that this
connection exists because the average properties of BMRs determine the strength of the
poloidal field at the end of the solar cycle. Together they represent observational
evidence in favor of the Babcock-Leighton (BL) mechanism and our current understanding of
the solar cycle.

2.3. Binning Data by Cycle Phase 

The last type of time binning we will review here is binning according to cycle phase.
This involves either choosing specific phases of the solar cycle (rising, maximum,
declining, and minimum phases), or binning the data relative to their position within a
particular solar cycle. Some examples of this kind of binning are the papers by Mathew et
al. (2007), who studied the dependence of umbral and penumbral brightness on the solar
cycle; Zharkova & Zharkov (2008), who examined daily variations of tilt angles during the
rising and declining phases of cycle 23; and Watson et al. (2011), who looked at maximum
magnetic field and umbral/penumbral areas during the different phases of cycle 23.

This type of time binning is very powerful for characterizing the general properties of
the solar cycle. An excellent example of this kind of work is that of Jiang et al. (2011),
who performed a very detailed quantitative characterization of the relationship between
cycle amplitude, cycle phase, and the properties of active latitudes (i.e., the shape,
location, and width of the wings in the butterfly diagram). Taking advantage of this
characterization, they laid a solid foundation for the construction of synthetic data sets
based solely on sunspot number, which can be used to drive surface flux transport
simulations.

3. ACTIVITY LEVEL: A NEW WAY OF BINNING DATA 

Although there is undeniable value in using time to bin sunspot and BMR data (when the
objective is to study and characterize the solar cycle), this kind of binning is
sub-optimal for studying how the evolution of the solar cycle changes the properties of
BMRs and their associated sunspots. The problem is conceptual: a single BMR and its
associated sunspot group are believed to be the photospheric manifestation of a buoyant
flux tube. These tubes take roughly a month to travel through the convection zone (Fan
2009; Weber et al. 2011) and have a typical lifetime (after eruption) of about a month.
Considering that the solar cycle involves decadal timescales, this means that from the
perspective of each BMR, the magnetic cycle is a quasi-static process (i.e.,
time-independent during a BMR's life cycle). It is commonly believed that the total number
of BMRs and their combined flux are a direct indication of the strength of the underlying
toroidal field. Furthermore, this magnetic field strongly determines the resulting
properties of the emerged BMRs (see the review by Fan 2009). We propose that the amplitude
of the solar cycle at any specific moment in time (which we assume to be directly related
to the characteristics of the underlying toroidal field) is the true quantity determining
the statistical properties of sunspots and BMRs, and that the time dependence of these
properties is simply a direct consequence of the evolution of this toroidal magnetic
field.

Figure 1. (a)-(b) Temporal binning of sunspot data vs. (c)-(d) data binning by activity
level. In panels (a)-(c), northern (southern) hemispheric data are marked using a light
blue (dark red) color. In panels (c) and (d), activity level is indicated using a color
scale changing from light yellow to dark blue. Note that the vertical and color axes are
the same in panel (c). Activity level, displayed in panels (b) and (c), is calculated by
convolving the daily sunspot group area with a six-month Gaussian filter.

Muñoz-Jaramillo et al.

To avoid possible misunderstandings, from now on we will use the term activity level to
refer to the mean level of hemispheric solar activity at any specific moment in time. We
do this in order to avoid confusion with a cycle's peak amplitude. In order to bin our
data according to activity level, we first calculate the total hemispheric daily sunspot
area, remove high-frequency components by convolving it with a six-month Gaussian filter
(Hathaway 2010; Muñoz-Jaramillo et al. 2013), and use the result to assign an activity
level to each data point. This kind of binning is illustrated in Figure 1(c), which shows
horizontal lines demarcating different activity levels. Note that activity level is
calculated separately for each hemisphere. It is also important to mention that we are
assuming no intrinsic differences in sunspot and BMR properties across different
hemispheres or across different cycles. Instead, we assume that what makes each cycle
unique is the actual evolution of activity level in each hemisphere (i.e., its ups and
downs).
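As a rough illustration of the smoothing and binning just described, the sketch below convolves a daily hemispheric area series with a normalized Gaussian kernel and assigns a bin index to a given activity level. It is only a sketch under stated assumptions: the text does not specify how the six-month width is defined, so the kernel width is taken here as a full width at half maximum of 182 days; the function names, the renormalization near the ends of the series, and the bin edges are hypothetical choices, not the implementation used in this work.

```python
import math


def gaussian_kernel(fwhm_days):
    """Normalized Gaussian weights (assumption: width given as a FWHM in days)."""
    sigma = fwhm_days / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    half = int(math.ceil(4 * sigma))  # truncate the kernel at 4 sigma
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def activity_level(daily_area, fwhm_days=182):
    """Smooth a daily hemispheric sunspot area series with a Gaussian kernel."""
    kernel = gaussian_kernel(fwhm_days)
    half = len(kernel) // 2
    n = len(daily_area)
    smoothed = []
    for t in range(n):
        acc, norm = 0.0, 0.0
        for j, w in enumerate(kernel):
            s = t + j - half
            if 0 <= s < n:
                acc += w * daily_area[s]
                norm += w
        # Renormalize so that the ends of the series are not biased low.
        smoothed.append(acc / norm)
    return smoothed


def bin_index(level, edges):
    """Index of the bin [edges[i], edges[i+1]) containing level (clamped)."""
    if level < edges[0]:
        return 0
    for i in range(len(edges) - 1):
        if edges[i] <= level < edges[i + 1]:
            return i
    return len(edges) - 2
```

Calling `activity_level` once per hemisphere yields two separate series, in line with the hemispheric treatment described above; each sunspot group would then inherit the smoothed value of its own hemisphere on its day of observation.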

Figure 1(c) and Figure 1(d), showing a butterfly diagram in which each point is colored
according to activity level, highlight some of the subtle but important aspects of binning
by activity level. First, we are assuming that activity levels and their associated
statistical properties are not unique to any given cycle. This means that sunspots and
BMRs that appear during the absolute maximum of a weak hemispheric cycle have the same
statistical properties as those that appear at the same activity level in stronger
hemispheric cycles. Cycle 19, the strongest cycle ever observed, is an extreme example of
this (as it samples the entire range of activity levels we have observed so far). Second,
under this assumption, sunspots that appear simultaneously in the northern and southern
hemispheres will have different statistical properties if there is a hemispheric asymmetry
in activity level.


4. DATA SELECTION AND TRUNCATION 

As the backbone of our analysis we use the sunspot group database compiled and published
by the Royal Greenwich Observatory (RGO). This set includes heliographic positions and
areas of sunspot groups observed from 1874 to 1976 by a small network of observatories:
the Cape of Good Hope, Kodaikanal, and Mauritius. The RGO data, covering nine solar cycles
(from cycle 12 to cycle 20), provide the longest and most complete record of sunspot group
areas. We extract from this database a single area and position for each sunspot group: we
assign to the group the single largest area reported over all its days of observation. The
result is a set of 30,026 groups. Data are shown in Figure 2(a).
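The reduction from daily observations to one entry per group can be sketched as follows. The record layout (dictionaries with `group_id`, `area`, and `latitude` keys) and the function name are hypothetical and chosen only for illustration; the actual RGO files use their own format.

```python
def largest_area_per_group(observations):
    """Keep, for each group, the observation with the largest reported area."""
    best = {}
    for obs in observations:
        group_id = obs["group_id"]
        if group_id not in best or obs["area"] > best[group_id]["area"]:
            best[group_id] = obs
    return best


# Hypothetical daily records: two groups followed over several days.
daily_records = [
    {"group_id": 1, "area": 120.0, "latitude": 12.0},
    {"group_id": 1, "area": 310.0, "latitude": 12.3},
    {"group_id": 1, "area": 250.0, "latitude": 12.5},
    {"group_id": 2, "area": 40.0, "latitude": -8.0},
    {"group_id": 2, "area": 35.0, "latitude": -8.1},
]
groups = largest_area_per_group(daily_records)
```

In this sketch the position kept for each group is the one recorded on the day of maximum area, which is one reasonable reading of "a single area and position".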

Since part of our objective is to study the transition between sunspot cycles 23 and 24,
we supplement the RGO data using observations taken by the Kislovodsk Mountain
Astronomical Station (KMAS) of the Central Astronomical Observatory at Pulkovo. The KMAS
has been in continuous operation since 1948, making it one of the very few institutions
performing a wide array of solar surveys through the entirety of the space age. This makes
it quite valuable as a connecting set between modern missions and previous surveys. This
database contains 108,364 sunspot group observations taken from 1954 February 9 to the
present (covering 6.5 solar cycles, from cycle 18 to cycle 24), giving us a substantial
overlap with the RGO set that we can use to cross-calibrate them. As with the RGO set, we
extract a single area and position for each sunspot group by assigning to the group the
single largest area reported over all its days of observation. The result is a set of
19,221 groups. KMAS data are available at http://158.250.29.123:8000/web/Soln_Dann/ and
are shown in Figure 2(b).

As recounted in detail by Muñoz-Jaramillo et al. (2015), there is a host of issues that
can potentially distort the statistical properties of structures near the lower detection
threshold. They include, but are not limited to, observational bias, artificial binning
caused by resolution, convolution of instrumental cadence and feature lifetime,
underestimation due to the quality of the observing conditions, and underestimation due to
excessive complexity in the observed phenomenon. Considering that small structures are
also the most numerous, here we follow the suggestion by C. DeForest (2014, private
communication), implemented by Muñoz-Jaramillo et al. (2015), of imposing a truncation
limit one order of magnitude above the minimum size of detection in our databases. We only
use data above this limit in our distribution fits and analysis. The location of these
thresholds, shown in Figure 2 as dark horizontal lines, successfully isolates problematic
data from the rest of each set.


4.1. Cross-Calibration 

Our first task is to cross-calibrate the RGO and KMAS data sets to form a composite
spanning from 1874 to the present. Given that the KMAS survey is still active, we use it
as our reference set. This way we will be able to extend our composite database into the
future as KMAS continues to perform observations. Since for this study it is important for
the sets to be statistically compatible, we find the proportionality calibration constant
by matching the KMAS (Fig. 3(a)) and RGO (Fig. 3(b)) empirical size distribution
functions. For this purpose, we use only data belonging to the overlapping interval
between the RGO and KMAS data sets (i.e., between 1954 and 1976). This technique was used
by Muñoz-Jaramillo et al. (2015) to reconcile and cross-calibrate 11 different sunspot
group, sunspot, and bipolar magnetic flux data sets. It involves the following steps:


1. Choose a proportionality constant out of a range 
of possible values. 

2. Multiply all sunspot group areas in the RGO database by this proportionality constant
(effectively shifting the empirical distribution left or right in logarithmic scale).

3. Evaluate if the resulting empirical distribution overlaps with the reference KMAS
empirical distribution.

4. Find the root mean square error (RMSE) between 
the overlaps. 

5. After trying all possible proportionality values in a 
set, identify which one minimizes the RMSE. 
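The five steps above can be sketched as a simple grid search. This is a minimal illustration rather than the procedure actually used: the empirical distribution is approximated here by a density on user-supplied (for example, logarithmically spaced) bins, the RMSE is computed on the base-10 logarithm of that density over bins populated in both sets, and all function names are hypothetical.

```python
import math


def binned_density(areas, edges):
    """Empirical density on the bins defined by edges: counts / (N * width)."""
    counts = [0] * (len(edges) - 1)
    for a in areas:
        for i in range(len(edges) - 1):
            if edges[i] <= a < edges[i + 1]:
                counts[i] += 1
                break
    n = len(areas)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


def best_calibration_factor(target_areas, reference_areas, factors, edges):
    """Return (factor, rmse) minimizing the mismatch with the reference set."""
    reference = binned_density(reference_areas, edges)
    best_factor, best_rmse = None, math.inf
    for factor in factors:                               # step 1
        scaled = [factor * a for a in target_areas]      # step 2
        density = binned_density(scaled, edges)
        # Step 3: compare only bins populated in both distributions.
        diffs = [math.log10(d) - math.log10(r)
                 for d, r in zip(density, reference) if d > 0 and r > 0]
        if not diffs:
            continue
        rmse = math.sqrt(sum(e * e for e in diffs) / len(diffs))  # step 4
        if rmse < best_rmse:                             # step 5
            best_factor, best_rmse = factor, rmse
    return best_factor, best_rmse
```

With the RGO areas as `target_areas` and the KMAS areas from the overlapping interval as `reference_areas`, the returned factor plays the role of the proportionality constant; restricting both inputs to areas above the truncation threshold would mimic the use of only the well-sampled part of each distribution.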

Figure 2. Logarithmic plot of sunspot group size of the (a) RGO and (b) KMAS data sets.
Dashed black horizontal lines indicate the threshold above which data are used to
calculate activity level and also fitted to the composite Weibull plus log-normal
distribution. This threshold is set an order of magnitude above the smallest structure of
each set.

Figure 3. (a) KMAS and (b) RGO empirical distribution functions for the period between
1954 and 1976. (c) The multiplication of RGO sunspot group area by a factor of 1.06
maximizes the agreement between both empirical distribution functions. Empirical
distribution functions show all data in each set, but only data in the dark shaded region
are used in the cross-calibration.

We find that multiplying RGO data by a factor of 1.06 maximizes the agreement between the
KMAS and RGO empirical distribution functions (shown in Fig. 3(c)). We construct our
composite by using all RGO data (with areas multiplied by the 1.06 factor), and KMAS data
from 1977 onward.

5. SIZE DISTRIBUTION, DISTRIBUTION FITTING, AND 
MODEL SELECTION 

In the past, different characterizations of the size-flux distribution of magnetic
structures have used different statistical distributions to fit the data: the exponential
[...], the Weibull distribution (Parnell [...]). [...] results of Muñoz-Jaramillo et al.
(2015), who found, after an in-depth quantitative comparison between the different
proposed distributions, that a linear combination of Weibull and log-normal distributions
fits observations best. This linear combination is used to define the probability density
function (PDF) of sunspot group sizes:

$$ f(x; k, \lambda, \mu, \sigma, c) = (1 - c)\,\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}} + \frac{c}{x\,\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}}, \tag{1} $$

where x is the area of a given magnetic structure, k > 0 and λ > 0 are the shape and scale
parameters of the Weibull distribution, μ and σ are the logarithmic mean and standard
deviation characterizing the log-normal, and 0 < c < 1 is the proportionality constant
that blends these distributions together. Note that we introduced a small change with
respect to the composite distribution defined by Muñoz-Jaramillo et al. (2015): here, the
Weibull (log-normal) term is multiplied by 1 - c (c). It is also important to highlight
that Equation (1) is normalized so that its integral over the entire space is equal to
one. This is necessary so that we can later compare the empirical and analytical PDFs
associated with activity level bins that contain different numbers of data points.

Because we are working with truncated sets, we use the following truncated form of our PDF
in our fits:

$$ \mathrm{PDF}_t(x) = \frac{\mathrm{PDF}(x)}{1 - \mathrm{CDF}(x_{\mathrm{trunc}})}, \tag{2} $$

where CDF denotes the cumulative distribution function associated with Equation (1), and
x_trunc denotes the limit value below which data are not used in the fit (see Section 4).

In order to fit this PDF to the data we use maximum likelihood estimation (MLE). Its basic
idea is to find the set of parameters that maximizes the likelihood of a statistical model
given the observed data. For this purpose, the user defines and maximizes a likelihood
function constructed using the probability of observing all data in the set. This method
is far superior to fitting functional forms to histograms because it is not sensitive to
the details of data binning. A more detailed description of MLE can be found in Appendix A
and in most modern statistics books (for example, Hoel 1984).

Figure 4. (a) Histogram using logarithmic binning of RGO/KMAS data. (b) Empirical PDF of
RGO/KMAS data. Both represent different ways of looking at the same data, and both panels
are overplotted with the same fit using a linear combination of Weibull (dashed blue line)
and log-normal (dotted yellow line) distributions. The composite fit is shown as a solid
dark red line. Both panels include all data in the set, but only data shown in a dark
shade are included in the fit.

Composite Fit to RGO/KMAS data

             Weibull              Log-Normal
             k         λ*         μ         σ         c
Fit          0.48      11.14      5.63      0.88      0.55
95% CI       ±0.15     ±3.98      ±0.13     ±0.04     ±0.10

Table 1

Fitting parameters of the composite distribution to RGO/KMAS sunspot group data.
Quantities accompanied by a * are in units of μHem, and other quantities are
dimensionless. The first row contains the fitted parameters; the second row the values of
their 95% confidence intervals.
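To make the composite PDF, its truncated form, and the likelihood used in MLE concrete, the following sketch evaluates all three with the Python standard library. The function names and the truncation value used in the usage note are hypothetical; the closed-form CDFs are the standard Weibull and log-normal expressions, and the optimizer that would minimize the negative log-likelihood is deliberately left out.

```python
import math


def weibull_pdf(x, k, lam):
    return (k / lam) * (x / lam) ** (k - 1.0) * math.exp(-((x / lam) ** k))


def weibull_cdf(x, k, lam):
    return 1.0 - math.exp(-((x / lam) ** k))


def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def composite_pdf(x, k, lam, mu, sigma, c):
    """Weibull term weighted by (1 - c), log-normal term weighted by c."""
    return (1.0 - c) * weibull_pdf(x, k, lam) + c * lognormal_pdf(x, mu, sigma)


def composite_cdf(x, k, lam, mu, sigma, c):
    return (1.0 - c) * weibull_cdf(x, k, lam) + c * lognormal_cdf(x, mu, sigma)


def truncated_pdf(x, x_trunc, k, lam, mu, sigma, c):
    """Composite PDF renormalized over x >= x_trunc; zero below the limit."""
    if x < x_trunc:
        return 0.0
    survival = 1.0 - composite_cdf(x_trunc, k, lam, mu, sigma, c)
    return composite_pdf(x, k, lam, mu, sigma, c) / survival


def negative_log_likelihood(areas, x_trunc, params):
    """Quantity to minimize in an MLE fit; params = (k, lam, mu, sigma, c)."""
    return -sum(math.log(truncated_pdf(a, x_trunc, *params)) for a in areas)
```

For example, `negative_log_likelihood(areas, 10.0, (0.48, 11.14, 5.63, 0.88, 0.55))` evaluates the fit parameters of Table 1 on a list of areas (in μHem) above an illustrative truncation limit of 10 μHem; a numerical minimizer over the five parameters would then carry out the fit.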


To quantify the relative improvement of our statistical model obtained by separating our
data according to activity level, we use Akaike's information criterion (AIC; Akaike
1983). The AIC is a powerful tool for discriminating between different models by making an
estimate of the expected, relative distance between the fitted model and the unknown true
mechanism that generated the observed data. It uses a combination of the likelihood of the
data and the model's degrees of freedom (dof) to strike a balance between bias and
variance (i.e., between underfitting and overfitting). A more detailed description of AIC
can be found in Appendix B and in an excellent book by Burnham & Anderson (2002).
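A compact sketch of how such a comparison works is given below. It assumes the standard textbook definitions (AIC = 2 dof - 2 ln L, differences taken with respect to the best model, and Akaike weights proportional to exp(-Δ/2)); the appendix of this paper may use a variant of these expressions, and the function names and input numbers are purely illustrative.

```python
import math


def aic(log_likelihood, dof):
    """Standard AIC: penalize the log-likelihood by the number of parameters."""
    return 2.0 * dof - 2.0 * log_likelihood


def akaike_weights(aic_values):
    """Return (differences to the best model, normalized Akaike weights)."""
    best = min(aic_values)
    deltas = [a - best for a in aic_values]
    relative = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative)
    return deltas, [r / total for r in relative]


# Illustrative (made-up) log-likelihoods and degrees of freedom for two models.
values = [aic(-1000.0, 5), aic(-990.0, 9)]
deltas, weights = akaike_weights(values)
```

In this toy case the second model gains 10 units of log-likelihood at the cost of 4 extra parameters, so it ends up with the lower AIC and nearly all of the weight.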


5.1. Fitting the Composite Distribution to the Entire 
RGO/KMAS Set 

Our first task is to fit the composite PDF to our RGO/KMAS database. This helps us place
this work in the light of the results of Muñoz-Jaramillo et al. (2015) and gives us a
reference against which we can evaluate the performance of PDF fitting of data binned by
activity level. The results of the fit are shown in Figure 4 and tabulated in Table 1.
They are in agreement with those found by Muñoz-Jaramillo et al. (2015).


6. FITTING THE COMPOSITE DISTRIBUTION TO THE 
BINNED RGO/KMAS SET 

After fitting our PDF to the entire data set, we now separate data according to activity
level. An inspection of the empirical size PDFs associated with different activity levels
(see Figures 5(a) and (b)) shows a striking relationship between the abundance of large
sunspot groups and activity level, with groups bigger than 1000 μHem being 30 times more
likely to occur during high activity levels (e.g., the peak of cycle 19) than during a
typical solar minimum. Additionally, as can be observed in Figures 5(d) and (e), the
Weibull-log-normal composite is able to capture the overall shape of the empirical PDF in
every case.

Figure 5. (a), (b) Empirical size distribution associated with different activity levels.
(d), (e) Composite PDFs fitted to data binned according to activity level. In order to
enhance perception, we use two ways of displaying the PDFs. In (a) and (d), a solid color
fills the area below each PDF and those associated with lower activity levels are plotted
closer to the foreground. In (b) and (e), each PDF is plotted using a thin line colored
according to activity level. Vertical dashed lines mark the limit below which data are not
included in the fits. Panels (c), (f)-(i) show the relationship between the different
fitting parameters in the composite PDF and activity level (see Equation (1)). Error bars
indicate the 95% confidence intervals of each value. The Spearman's rank correlation
coefficient (ρ) and its confidence level (P) are included as the title of each of these
panels.

In order to evaluate the relationship between activity level and the different fitting
parameters of the composite PDF, we use Spearman's rank correlation coefficient (ρ;
Spearman 1904), which assesses how well the relationship between two variables can be
described using a monotonic function. The results are displayed in Figures 5(c), (f)-(i).
We find no correlation between activity level and the parameters characterizing the
Weibull component of the composite PDF, with ρ = 0.0 and ρ = -0.19 for the scale factor
(λ) and shape parameter (k), respectively. On the other hand, the parameters that
characterize the log-normal component are found to be correlated with activity level with
a high degree of statistical significance (above 98%), with ρ = 0.66 and ρ = 0.57 for the
logarithmic mean (μ) and the logarithmic standard deviation (σ), respectively. We find a
moderate correlation between activity level and the proportionality constant that blends
these distributions together (c), with ρ = 0.37.
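For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation of the ranks of the two variables. The sketch below computes it with average ranks for ties; it does not compute the confidence level (P) quoted in the figures, and the function names and sample values are illustrative only.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = 0.5 * (i + j) + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation coefficient of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values: a fitted parameter that mostly grows with activity level.
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
parameter = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(activity, parameter)
```

Because only ranks enter the calculation, any monotonic relationship (linear or not) between a fitted parameter and activity level yields |ρ| = 1.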

7. USING COMMON WEIBULL PARAMETERS FOR ALL 
ACTIVITY LEVELS 

The apparent independence between activity level and the parameters characterizing the
Weibull component of the composite PDF is in agreement with the results of Hagenaar et al.
(2003, 2008), who found essentially no dependence between the distribution of ephemeral
regions and the solar cycle. Furthermore, considering the values that these parameters
assume, and the large width of their 95% confidence intervals (see Figures 5(c) and (f)),
it is clear that our fitting algorithm is using these unconstrained parameters to over-fit
the data.

Figure 6. (a), (b) Empirical size distribution associated with different activity levels.
(d), (e) Composite PDFs fitted to data binned according to activity level using the same
Weibull parameters for all activity levels. Vertical dashed lines mark the limit below
which data are not included in the fits. (c) Total average log-likelihood (Tlk) for all
activity levels as a function of the Weibull parameters k and λ (see Equation (3)). The
optimum values that maximize Tlk simultaneously for all activity level bins are k_best =
0.46 and λ_best = 11.49 μHem. Panels (g)-(i) show the relationship between the remaining
parameters in the composite PDF and activity level. Error bars indicate the 95% confidence
intervals of each value. The Spearman's rank correlation coefficient (ρ) and its
confidence level (P) are included as the title of each of these panels. Fits to the
relationships between these parameters and activity level are shown as red dashed lines.
The analytical expression for each fit is included in the legend of each panel.

To address this, we re-fit our data with the additional constraint that the parameters
characterizing the Weibull component must be the same for all activity levels. We do this
by maximizing the total average log-likelihood (Tlk) for all activity levels:

$$ \mathrm{Tlk}(k, \lambda) = \sum_{j=1}^{N_{\mathrm{bins}}} \frac{1}{N_j} \sum_{i=1}^{N_j} \ln f(x_i; k, \lambda, \mu_j, \sigma_j, c_j), \tag{3} $$

where f(x; k, λ, μ, σ, c) is our composite PDF (see Equation (1)); the index j denotes
each activity level bin; the index i denotes each data point in a bin; N_j is the number
of data points in bin j; k and λ (the Weibull parameters) are free to vary but must be the
same for all activity level bins; and μ_j, σ_j, and c_j (the log-normal parameters and the
constant of proportionality) are allowed to be different for different bins.
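The structure of this constrained fit can be sketched as follows. The sketch assumes that Tlk is the sum over bins of the per-bin mean log-likelihood and, for simplicity, holds the per-bin log-normal parameters fixed while scanning a grid in the shared Weibull parameters; in a full fit those per-bin parameters would be re-optimized at every grid point. Function names and inputs are hypothetical.

```python
import math


def composite_pdf(x, k, lam, mu, sigma, c):
    """Composite Weibull plus log-normal PDF with mixing constant c."""
    weibull = (k / lam) * (x / lam) ** (k - 1.0) * math.exp(-((x / lam) ** k))
    z = (math.log(x) - mu) / sigma
    lognormal = math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))
    return (1.0 - c) * weibull + c * lognormal


def total_average_log_likelihood(k, lam, bins):
    """Sum over bins of the mean log-likelihood of the data in each bin.

    bins is a list of (areas, mu_j, sigma_j, c_j) tuples, one per activity bin.
    """
    total = 0.0
    for areas, mu_j, sigma_j, c_j in bins:
        log_sum = sum(math.log(composite_pdf(x, k, lam, mu_j, sigma_j, c_j))
                      for x in areas)
        total += log_sum / len(areas)
    return total


def best_weibull_parameters(bins, k_grid, lam_grid):
    """Grid search for the shared (k, lam) pair that maximizes Tlk."""
    best = None
    for k in k_grid:
        for lam in lam_grid:
            tlk = total_average_log_likelihood(k, lam, bins)
            if best is None or tlk > best[0]:
                best = (tlk, k, lam)
    return best
```

Averaging within each bin before summing gives every activity level the same weight, so that sparsely populated high-activity bins are not swamped by the far more numerous low-activity data.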

As can be seen in Figure 6(c), Tlk has a single global maximum located at k_best = 0.46
and λ_best = 11.49 μHem. These values are well within the 95% confidence intervals
previously found for k and λ in both the unconstrained fit (see Figures 5(c) and (f)) and
the fit to the unbinned RGO/KMAS set (see Table 1).

After forcing k and λ to have the same value for all activity level bins, there is a
remarkable tightening of the relationship between activity level and the remaining PDF
parameters (μ, σ, and c), which can be seen both qualitatively and as a significant
improvement in the Spearman's rank correlation coefficients.

We perform a χ² fit to this dependence using power functions (see Figures 6(g)-(i) for
fitting values), finding a reduced χ² lower than unity in all cases. Although in this work
we fit these dependencies using power functions because of their simplicity, there are
several other functional forms that would fit the scatter plots equally well within the
95% confidence intervals (for example, logarithmic and exponential forms). The true
characterization of these dependencies would involve a large number of tests that is
beyond the scope of this paper and will be performed in a later work.

Quantification of Distribution Performance Using AIC

Fit Characteristics                                   Description    Log-Likelihood   Degrees of Freedom   Δ_AIC    A_w
No dependence on activity level                       Section 5.1    -2.097×10^5      5                    3,779    <0.001
Unconstrained, binned by activity level               Section 6      -2.093×10^5      85                   3,063    <0.001
Constrained, binned by activity level                 Section 7      -2.093×10^5      53                   3,033    <0.001
No binning, analytical dependence on activity level   Equation (4)   -2.078×10^5      9                    0        >0.999

Table 2

Comparative performance of the different ways of fitting the data presented in this paper.
Δ_AIC is the relative AIC difference described by Equation (B2). A_w is the Akaike weight
described by Equation (B4). The lower Δ_AIC is, the more a model is likely to be correct
(quantified using A_w). The best model according to AIC is the one in the last row
(Δ_AIC = 0).


Nevertheless, using these results, one can define a PDF with constant k and λ, whose properties depend on activity level through the relationships shown in Figures 6(g)-(i), and in which binning by activity level is no longer necessary. This PDF is defined as: 

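\[
f[x; k, \lambda, \mu(AL), \sigma(AL), c(AL)] =
  [1 - c(AL)] \, \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k-1} e^{-(x/\lambda)^{k}}
  + \frac{c(AL)}{x \, \sigma(AL) \sqrt{2\pi}} \, e^{-\frac{(\ln x - \mu(AL))^{2}}{2 \sigma^{2}(AL)}}
\]

Weibull:                   k = 0.46,  λ = 11.49 μHem 
Log-normal:                μ(AL) = 5.97 AL^0.021,  σ(AL) = 1.08 AL^0.117 
Proportionality constant:  c(AL) = 0.66 AL^0.230 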

where AL is the activity level in mHem on the day each 
sunspot group was observed. 
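
The PDF above can be sketched in Python as follows. This is a minimal illustration, assuming the parameter values quoted above, AL > 0 in mHem, and x > 0 in μHem; the function names (`mu`, `sigma`, `c`, `weibull_pdf`, `lognormal_pdf`, `composite_pdf`) are hypothetical and not part of any existing package.

```python
import math

K = 0.46      # Weibull shape (constant with activity level)
LAM = 11.49   # Weibull scale in microHem (constant with activity level)

def mu(al):
    """Log-normal location parameter as a power function of activity level."""
    return 5.97 * al ** 0.021

def sigma(al):
    """Log-normal width parameter as a power function of activity level."""
    return 1.08 * al ** 0.117

def c(al):
    """Proportionality constant: weight of the log-normal component."""
    return 0.66 * al ** 0.230

def weibull_pdf(x, k=K, lam=LAM):
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def lognormal_pdf(x, m, s):
    return math.exp(-((math.log(x) - m) ** 2) / (2 * s ** 2)) / (x * s * math.sqrt(2 * math.pi))

def composite_pdf(x, al):
    """Weighted sum of Weibull and log-normal components for area x at activity level al."""
    w = c(al)
    return (1 - w) * weibull_pdf(x) + w * lognormal_pdf(x, mu(al), sigma(al))
```

Because both components are individually normalized, the composite integrates to unity for any activity level at which 0 ≤ c(AL) ≤ 1; with the coefficients above, this sketch is therefore only meaningful while c(AL) stays below one.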


8. QUANTITATIVE MODEL SELECTION 

Now that we have characterized the dependence of the size-flux PDF on activity level, our task is to quantitatively identify the best of all the models that have been used so far. These are: 


1. The same PDF irrespective of activity level (see 
Muñoz-Jaramillo et al. 2015 and Section 5.1). 


2. A different PDF for each activity level bin (see Section 6). 


3. A different PDF for each activity level bin, but in 
which the values of k and λ are forced to be equal 
for every bin (see Section 7). 


4. A PDF with constant k and λ, but in which μ, 
σ, and c depend on activity level through power 
functions (see Equation (4)). 


For this purpose, we use the AIC (described in detail in Appendix B), and the results are shown in Table 2. A comparison between log-likelihood and dof (columns 3 and 4, respectively) shows that log-likelihood is the main factor determining AIC (not the dof; see Equation (B1)). The reason is that our data set has significantly more data points than the dof in each of our models. 

As expected, the worst fit corresponds to the PDF that does not depend on activity level (defined in Section 5.1). This is followed by our unconstrained and constrained fits binned by activity level (defined in Sections 6 and 7, respectively). We find, with a very high degree of statistical significance, that the best model to fit our data is the unbinned PDF whose parameters depend analytically on activity level (defined by Equation (4)). It is important to highlight that AIC works only as a relative estimate. This means that it cannot tell us whether Equation (4) characterizes the true mechanism that generated the observed data. Instead, it allows us to rule out all the other models proposed in this work (with near certainty) in favor of the model described by Equation (4). Taken together with the results of Muñoz-Jaramillo et al. (2015), who fitted five more models to similar data (including the model described in Section 5.1), we can take advantage of the relative nature of AIC to rule those models out as well. 


9. THE SOLAR MAGNETIC FLOOR AND THE TIME DEPENDENCE OF SUNSPOT PROPERTIES 

As was shown in Section 5.1, we find the Weibull component of the composite PDF to be independent of activity level. This result is in agreement with other studies arguing in favor of a magnetic baseline (Svalgaard & Cliver 2007; Schrijver et al. 2011; Cliver 2012), since it suggests that small magnetic structures in the photosphere indeed arise from a cycle-independent process. However, the striking connection between the size-flux PDF and activity level (demonstrated in Section 7) allows us to do more than that; it allows us to make a quantitative comparison of the depth of each minimum during the last 130 yr (12 solar cycles). For this purpose, we calculate the expected value of sunspot group area (i.e., the average size of magnetic structures) as a function of activity level: 

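\[
\begin{aligned}
E[f(AL)] &= \int_{0}^{\infty} x \, f[x; k, \lambda, \mu(AL), \sigma(AL), c(AL)] \, dx \\
         &= [1 - c(AL)] \, \lambda \, \Gamma\!\left( 1 + \frac{1}{k} \right) \\
         &\quad + c(AL) \, e^{\mu(AL) + \sigma^{2}(AL)/2}
\end{aligned}
\tag{4}
\]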

where Γ is the gamma function. This quantity can be 
used as a thermometer for solar magnetism, as it tells us 
the typical magnetic structure size that we can expect to 
see as a function of time. 
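
The closed form of the expected value can be evaluated directly. The sketch below is a minimal illustration under the assumption that the power-function coefficients quoted in Section 7 apply and that AL > 0 is given in mHem; the function name `expected_area` is hypothetical.

```python
import math

K, LAM = 0.46, 11.49  # Weibull shape and scale (microHem) quoted in Section 7

def expected_area(al):
    """Return (weibull_part, lognormal_part, total) of the expected group area in microHem."""
    c = 0.66 * al ** 0.230        # weight of the log-normal component
    mu = 5.97 * al ** 0.021       # log-normal location parameter
    sigma = 1.08 * al ** 0.117    # log-normal width parameter
    # Mean of a Weibull distribution: lam * Gamma(1 + 1/k), weighted by (1 - c).
    weibull_part = (1 - c) * LAM * math.gamma(1 + 1 / K)
    # Mean of a log-normal distribution: exp(mu + sigma^2 / 2), weighted by c.
    lognormal_part = c * math.exp(mu + sigma ** 2 / 2)
    return weibull_part, lognormal_part, weibull_part + lognormal_part

# Compare a very quiet day with an active one (illustrative activity levels).
quiet = expected_area(1e-4)
active = expected_area(1.0)
```

Under these assumed coefficients, the log-normal term dominates at high activity level and falls below the Weibull term only when the activity level becomes very small, which is the behavior the comparison between components relies on.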

As with the composite PDF, the expected value of the sunspot group area is also a linear combination of Weibull (second line, Equation (4)) and log-normal (third line, Equation (4)) components. Furthermore, almost all of the time dependence of the expected value can be attributed to the log-normal component. The consequence is that the magnetic baseline is defined by the Weibull component and is only truly visible at times when the emergence of cycle-related BMRs shuts down. 

Figure 7. (a) Total daily sunspot area smoothed using a six-month Gaussian filter; this is the quantity we use to define activity level. (b) Log-normal and Weibull contributions to the expected sunspot group area. Log-normal components are calculated separately for each hemisphere; the Weibull component is shown using a black dotted line. For both panels, the northern (southern) hemisphere is denoted by a solid light blue (dashed dark red) line. (c) Expected value of sunspot group area for the whole Sun (solid green line). For reference, the Weibull component is shown as well. 

Figure 7(b) shows the time evolution of the log-normal component of the expected sunspot area and how it contrasts with its Weibull counterpart. Note that there is a weak modulation of the Weibull contribution to the expected value because the proportionality constant (c) by definition depends on activity level. We find that the log-normal contribution drops below its Weibull counterpart in most hemispheric minima. Using a notation where n.5 corresponds to the minimum between cycle n and cycle n+1, the exceptions are 15.5N&S, 17.5N&S, 19.5N, 20.5N&S, 21.5N&S, and 22.5N&S (note that most exceptions occur during the space age). This means that, from a hemispheric point of view, we have been able to observe the magnetic baseline. However, the results are different from a whole Sun point of view. We calculate the whole Sun expected value as a weighted average of the expected values of the northern and southern hemispheres. Due to its strong correlation with sunspot numbers, for simplicity we use activity level as our weighting coefficient: 


\[
E_{WS} = \frac{AL_{N} E_{N} + AL_{S} E_{S}}{AL_{N} + AL_{S}}
\tag{5}
\]

where E_WS, E_N, and E_S are the whole Sun, northern, 
and southern expected values, respectively, and AL_N 
(AL_S) is the activity level in the northern (southern) 
hemisphere. 
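
The weighted average in Equation (5) is simple enough to sketch in a few lines of Python. The function name `whole_sun_expected` and the example values are hypothetical and are only meant to show how an asymmetry between hemispheres affects the result.

```python
def whole_sun_expected(al_n, e_n, al_s, e_s):
    """Activity-level-weighted average of the hemispheric expected values."""
    total = al_n + al_s
    if total <= 0:
        raise ValueError("at least one hemisphere needs a positive activity level")
    return (al_n * e_n + al_s * e_s) / total

# One hemisphere at its baseline, the other still active (illustrative values).
asymmetric = whole_sun_expected(al_n=0.0001, e_n=25.0, al_s=0.5, e_s=300.0)
# Both hemispheres equally active: the result is the plain mean.
symmetric = whole_sun_expected(al_n=1.0, e_n=30.0, al_s=1.0, e_s=50.0)
```

Because the weights are the activity levels themselves, a hemisphere that has reached its baseline contributes very little to the whole Sun value whenever the other hemisphere is still active.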

Figure 7(c) shows the time evolution of the expected value for the whole Sun (including both log-normal and Weibull components) and the Weibull baseline for comparison. The story is quite different; there have been only two solar minima during the last 12 cycles in which observation of the baseline magnetism has been possible in both hemispheres simultaneously (i.e., for the whole Sun; 13.5 and 14.5). Apart from those, hemispheric asymmetries have conspired to raise the whole Sun level above the baseline magnetism. 

One of the striking features evidenced in Figure 7(c) is 
how different the minimum of cycle 23 is when compared 
to other minima during the space age (19.5 through 23.5). 
It is no wonder that it seemed so unusual to us. This 
has important implications because the properties of the 
heliosphere, its current sheet, and the background solar 
wind are strongly determined at a global level. The fact 
that the minimum of cycle 23 had a very asymmetric 
current sheet is evidence that solar magnetism was not in 
its baseline state. Based on this result, we infer that even 
though each hemisphere did reach the magnetic baseline, 
the minimum of cycle 23 was not as deep as it could have 
been. 


10. SMALL-SCALE VERSUS GLOBAL DYNAMO? 


In our previous paper (Muñoz-Jaramillo et al. 2015), we proposed that the existence of two populations of sunspot groups originated from separate contributions by the small-scale and global components of the dynamo. However, with the evolution of our understanding we feel that further clarification is necessary. We find strong evidence that the size-flux distribution has two components that are discriminated by size, and only the properties of the larger structures are modulated by the solar cycle. However, in spite of the fact that the properties of small pores are independent of the solar cycle, their latitude of appearance is still modulated by active latitudes (as can be seen in Figure 0). This means that there must still be a connection between these structures and the cycle itself and that their formation cannot be attributed solely to the small-scale component of the dynamo (otherwise they would be observable throughout the photosphere). 

A possible explanation is that these small structures arise from the re-processing of the decaying magnetic field of their large-scale counterparts and thus their relative numbers are governed by the properties of surface convection (and only loosely by the amount of available decaying field). In this case, the magnetic baseline found in the previous section is contingent on the emergence of a minimum amount of large-scale structures and cannot be taken as a hard lower limit for grand minima like the Maunder minimum. Unfortunately, it is difficult to make a further assessment of this connection without resorting to magnetic data. For this reason, we will look at this issue in more detail in future work involving magnetic structure catalogs compiled using SOHO/MDI and SDO/HMI data. 


11. SUMMARY AND CONCLUDING REMARKS 

In this work we have introduced a new way of binning sunspot group and BMR data with the purpose of better understanding the impact of the solar cycle on sunspot and BMR properties and how this defines the characteristics of the extended minimum of cycle 23. This approach hinges critically on our current understanding of BMRs as the photospheric manifestation of emergent buoyant flux tubes arising from a large-scale underlying toroidal field. In particular, we assume that from the point of view of each active region, the solar cycle can be approximated as a quasi-static process. This means that the properties of sunspots and BMRs are completely determined by the strength of the underlying toroidal field and have no additional long-term dependencies. In other words, we are assuming that the statistical properties of sunspots and BMRs do not depend on cycle phase (rising versus declining; maximum versus minimum), but on how strong the cycle is at each particular moment (something we refer to as activity level). 

In this work we build upon the results of Muñoz-Jaramillo et al. (2015), who found, after analyzing 11 different databases, that the solar size-flux distribution is better characterized by a linear combination of Weibull and log-normal distributions, where a pure Weibull (log-normal) characterizes the distribution of structures with fluxes below (above) 10^21 Mx (10^22 Mx). After binning our data according to activity level, we fit this composite distribution to each separate bin and look at the dependence of each parameter on activity level. 

We find that the parameters that characterize the Weibull component have no dependence on activity level. This is in agreement with the results of Hagenaar et al. (2003, 2008), who found essentially no dependence between the distribution of ephemeral regions (below 10^20 Mx) and the solar cycle. We propose that the structures characterized by the Weibull component are what give the Sun a magnetic baseline. 

In stark contrast to the Weibull component, we find a clear dependence between activity level and the parameters that characterize the log-normal component of the size-flux distribution (which we fit using power functions). This supports the interpretation of Muñoz-Jaramillo et al. (2015), who proposed that the log-normal component is directly connected to the global component of the dynamo (and the generation of bipolar active regions). 

By taking advantage of our analytical characterization of the size-flux distribution and its dependence on activity level, we study the relative contribution of each component (small-scale versus large-scale) to solar magnetism. In order to do this, we calculate the expected value of sunspot group areas and study its evolution with time. We find that from a hemispheric point of view, almost every solar minimum (during the last 12 cycles) reaches a point where the main contribution to magnetism comes from the small-scale component. However, due to asymmetries in cycle phase, this state is very rarely reached by both hemispheres at the same time (according to our data, only during the minima of cycles 13 and 14). There is no question that the extended minimum of cycle 23 is deeper than any other minimum of the space age. However, based on our results, we infer that even though each hemisphere did reach the magnetic baseline, from a heliospheric point of view the minimum of cycle 23 was not as deep as it could have been. 
APPENDIX 

A. MAXIMUM LIKELIHOOD ESTIMATION 


The idea behind MLE is to find the set of parameters that maximizes the likelihood of a statistical model M given the observed data D = {D_1, D_2, ..., D_n} by maximizing the likelihood function (L): 

\[
L(M) \propto \mathrm{pr}(D|M) = \prod_{i=1}^{n} \mathrm{pr}(D_{i}|M).
\tag{A1}
\]

This process of maximization is typically performed by first taking the logarithm of both sides of Equation (A1), and maximizing the resulting log-likelihood (lk) function: 

\[
lk(M) = \sum_{i=1}^{n} \log\left( \mathrm{pr}(D_{i}|M) \right).
\tag{A2}
\]
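
To make Equation (A2) concrete, the Python sketch below evaluates the log-likelihood of a pure Weibull model and maximizes it over a grid of shape values, using the closed-form optimal scale for each shape. It is an illustrative toy rather than the fitting code used in this work: the function names, the grid search, and the synthetic sample drawn with `random.weibullvariate` are all assumptions.

```python
import math
import random

def weibull_log_likelihood(data, k, lam):
    """Sum of log Weibull densities over the data (Equation (A2) for a Weibull model)."""
    return sum(
        math.log(k / lam) + (k - 1) * math.log(x / lam) - (x / lam) ** k
        for x in data
    )

def fit_weibull_grid(data, k_grid):
    """Maximize the log-likelihood over a grid of shapes k.

    For a fixed k, the scale that maximizes the likelihood has the closed
    form lam = (mean(x**k))**(1/k), so only k needs to be searched.
    """
    best = None
    for k in k_grid:
        lam = (sum(x ** k for x in data) / len(data)) ** (1 / k)
        lk = weibull_log_likelihood(data, k, lam)
        if best is None or lk > best[2]:
            best = (k, lam, lk)
    return best

# Synthetic sample with known parameters (scale 11.49, shape 0.46).
random.seed(0)
sample = [random.weibullvariate(11.49, 0.46) for _ in range(5000)]
k_hat, lam_hat, lk_hat = fit_weibull_grid(sample, [0.30 + 0.01 * i for i in range(41)])
```

A grid search is used here only because it keeps the example free of external dependencies; a numerical optimizer would normally replace it for a model with more parameters.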


B. AKAIKE’S INFORMATION CRITERION 


The AIC for a model M_j is defined as: 

\[
AIC_{j} = -2 \, lk(M_{j}) + 2 n_{j},
\tag{B1}
\]

where lk(M_j) is the log-likelihood of model M_j (see Equation (A2)) and n_j is the number of parameters of model j. The model with the minimum AIC is chosen as the best. In a sense, by minimizing the AIC one is looking for the model with the largest log-likelihood. However, log-likelihood alone is not sufficient to discriminate between models because it is biased as an estimation of the model selection target. This bias was found by Akaike (1983) to be approximately equal to each model’s number of parameters (n), hence the presence of the second term in Equation (B1). The significance of the AIC is strongly dependent on an appropriate choice of models. Applying the AIC to a set of very poor models will always select one estimated to be the best (even though that model may still be poor in an absolute sense). 

The relative nature of the AIC is better represented by calculating the relative AIC differences: 

\[
\Delta^{AIC}_{j} = AIC_{j} - \min(AIC).
\tag{B2}
\]

This in turn can be used to estimate the likelihood of a model given the data: 

\[
L(M_{j}|D) \propto \exp\left( -\frac{\Delta^{AIC}_{j}}{2} \right),
\tag{B3}
\]

and use it to calculate the Akaike weights: 

\[
Aw_{j} = \frac{\exp\left( -\frac{\Delta^{AIC}_{j}}{2} \right)}{\sum_{i} \exp\left( -\frac{\Delta^{AIC}_{i}}{2} \right)},
\tag{B4}
\]

which are a measure of the probability that the model M_j is the best model given the data. 
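
The chain from log-likelihood to Akaike weight can be sketched in Python as follows, using the standard AIC definition (minus twice the log-likelihood plus twice the number of parameters). The function name `akaike_weights` is hypothetical, and the inputs are the rounded log-likelihoods and degrees of freedom listed in Table 2, so the recomputed differences only approximate the tabulated ones.

```python
import math

def akaike_weights(log_likelihoods, n_params):
    """Return (aic, delta, weights) for a set of competing models."""
    # AIC for each model: -2 * log-likelihood + 2 * number of parameters.
    aic = [-2 * lk + 2 * n for lk, n in zip(log_likelihoods, n_params)]
    # Relative differences with respect to the best (minimum AIC) model.
    best = min(aic)
    delta = [a - best for a in aic]
    # Relative likelihood of each model, then normalization into weights.
    rel = [math.exp(-d / 2) for d in delta]
    total = sum(rel)
    return aic, delta, [r / total for r in rel]

# Rounded log-likelihoods and degrees of freedom from Table 2, in row order.
lk_values = [-2.097e5, -2.093e5, -2.093e5, -2.078e5]
dof_values = [5, 85, 53, 9]
aic, delta, weights = akaike_weights(lk_values, dof_values)
```

Since the differences between models are in the thousands, the exponentials of all but the best model underflow to zero, which is why the weights in Table 2 are reported only as bounds.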